A while back I told you about Simpson's paradox, and it was surprising how easy it was to draw a false conclusion from data.
Today I will give you a deeper insight into a common mistake made in interpreting statistical data: confusing correlation with causation. I'll show you an example where data is correlated and why it's tempting to confuse correlation with causation. Both of those are words that start with a C, and very frequently I read newspaper articles that deeply confuse correlation and causation. So let's dive in.
Suppose you are sick, and you wake up with a strong pain in the middle of the night. You're so sick that you fear you might die, but you're not so sick that you can't apply the lessons of my Statistics 101 class to make a rational decision about whether to go to the hospital. And in doing so, you consult the data.
You find that in your town, over the last year, 40 people were hospitalized, of whom 4 passed away. The vast majority of your town's population never went to the hospital, and of those, 20 passed away at home. So compute for me the percentage of the people who died in the hospital and the percentage of the people who died at home.
Now, I offer this as a fictitious example, and these are relatively large numbers. But what's important to notice is that the chances of dying in a hospital are 40 times as large as the chances of dying at home.
That means whether you die or not is correlated with whether or not you are in a hospital. So the chances of dying in a hospital are indeed 40 times larger than at home.
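To make the arithmetic concrete, here is a minimal sketch in Python. The hospital numbers (40 patients, 4 deaths), the 20 deaths at home, and the factor of 40 come from the example above. The number of people who stayed home is not stated, so the sketch works backwards from the factor of 40 to the value it implies. Treat that value as an inference, not a given, and note that the variable names are just illustrative.

```python
from fractions import Fraction

# Numbers stated in the example
hospital_patients = 40
hospital_deaths = 4
home_deaths = 20
stated_ratio = 40  # "40 times as large", taken from the text

# Share of hospitalized people who died: 4 out of 40
hospital_rate = Fraction(hospital_deaths, hospital_patients)

# Assumption: the home death rate is whatever makes the stated ratio hold
home_rate = hospital_rate / stated_ratio

# Inferred, not stated: how many people must have stayed home
implied_home_population = home_deaths / home_rate

print(f"hospital death rate: {float(hospital_rate):.2%}")
print(f"implied home death rate: {float(home_rate):.2%}")
print(f"implied home population: {int(implied_home_population)}")
```

Under these assumptions the hospital rate works out to 10% and the home rate to 0.25%, which is the gap that makes staying home look so tempting.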
So let me ask the critical question. Should you now stay at home? Given that you are a really smart statistics student, can you resist the temptation to go to the hospital, because indeed it might increase your chances of passing away?